Week 1 of 12 · Part A — Applied Safety

What AI Safety Actually Is

Not a worry, a discipline — finding how a deployed model fails before its users do

Day 1 · ~75 minutes · Concept

Day 1 of 60

The one idea this whole track rests on

"AI safety" sounds like a feeling — a vague unease about powerful technology. It isn't. It's an engineering discipline with its own methods, metrics, and artifacts. Its job is concrete: given a model and how it's deployed, find the ways it can fail, misbehave, or be misused — before a user, an attacker, or the public finds them for you — and then measure and reduce those failures.

The thesis

A capable model is not automatically a safe one. Capability and safety are different properties, measured differently and earned separately. The people who can systematically find, measure, and reduce a system's failures are the bottleneck on whether it can be responsibly deployed. This track makes you one of those people.

Over 12 weeks you'll move through three layers of the field: applied safety (the hands-on craft of taxonomies, red-teaming, and evaluation), alignment research literacy (the deeper question of why capable systems can pursue the wrong goal), and governance (the frameworks and regulation that turn testing into accountability). Today we install the map so everything later has a place to hang.

Three kinds of risk — keep these separate

Almost every AI safety concern is one of three types. Conflating them is the most common rookie mistake; a practitioner names which one they're talking about.

Core Theory

1 · Misuse — a person uses the model to cause harm

The model works as intended, but a human points it at something harmful: generating disallowed content, assisting an attack, harassing someone. The failure is in who's using it and how. Defenses: content policy, refusals, monitoring, access controls.

2 · Accidents — the model itself behaves in unintended ways

No bad actor required. The model does something its designers didn't want — a harmful output on a benign prompt, a confidently wrong medical answer, an agent that takes a destructive action. The failure is in the system's own behavior. Defenses: alignment techniques, robustness, evaluation.

3 · Systemic / structural — harms that emerge from deployment at scale

No single output is the problem; the aggregate is — concentration of power, erosion of trust, labor effects, an AI race that cuts safety corners. The failure is in the broader system the model is embedded in. Defenses: governance, policy, institutional design.

Read this back to yourself

Misuse is about users. Accidents are about the model. Systemic risk is about the world the model is deployed into. Different causes, different defenses. When someone says "AI is dangerous," your first move is to ask: which of the three?
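One way to make the habit stick is to write the taxonomy down as data. The sketch below is only an illustration: the names `RiskType`, `TAXONOMY`, and `defenses_for` are hypothetical, and the entries simply restate the failure locations and defenses listed in this lesson.

```python
from enum import Enum


class RiskType(Enum):
    """The three risk types from this lesson (names are illustrative)."""
    MISUSE = "misuse"
    ACCIDENT = "accident"
    SYSTEMIC = "systemic"


# Where the failure lives, and the defenses this lesson names for each type.
TAXONOMY = {
    RiskType.MISUSE: {
        "failure_in": "who is using the model and how",
        "defenses": ["content policy", "refusals", "monitoring", "access controls"],
    },
    RiskType.ACCIDENT: {
        "failure_in": "the system's own behavior",
        "defenses": ["alignment techniques", "robustness", "evaluation"],
    },
    RiskType.SYSTEMIC: {
        "failure_in": "the broader system the model is embedded in",
        "defenses": ["governance", "policy", "institutional design"],
    },
}


def defenses_for(risk: RiskType) -> list[str]:
    """Return the defenses associated with one risk type."""
    return TAXONOMY[risk]["defenses"]


# "Which of the three?" becomes a lookup once the concern has been named.
for risk in RiskType:
    print(f"{risk.value}: failure in {TAXONOMY[risk]['failure_in']}")
    print(f"  defenses: {', '.join(defenses_for(risk))}")
```

The point of the structure is that a concern has to be assigned a `RiskType` before any defense can be looked up, which mirrors the "which of the three?" question.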

Why this is engineering, not just ethics

Ethics tells you what you should want (don't deploy systems that cause harm). It doesn't tell you whether your system actually causes harm. That gap, between good intentions and verified behavior, is where safety engineering lives. The seminal paper Concrete Problems in AI Safety (Amodei et al., 2016) made this move explicit, reframing safety as five tractable engineering problems: avoiding negative side effects, avoiding reward hacking, scalable oversight, safe exploration, and robustness to distributional shift.

That reframing is why you can become an AI safety practitioner without training a single neural network. The skills that make systems safer — writing precise policies, running rigorous red-teams, designing honest evaluations, reasoning about failure — are safety skills, not modeling skills.

What this means for you

You don't need to be an ML researcher to do this work. You need to think adversarially, measure honestly, and communicate clearly about risk. Those are learnable, and they're exactly what this track drills.

The first habit: threat modeling

The single most portable skill in applied safety is threat modeling: given a system, systematically ask "what could go wrong, for whom, and how would we catch it?", then rank the answers so you work on what matters most first. You'll build a real threat model on Day 3 and reuse the habit for 12 weeks. Today, just internalize the question, because it's the lens through which you'll view the rest of the field.
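The three questions and the ranking step can be sketched as a small data structure. This is a minimal illustration under stated assumptions, not a standard method: the `Threat` class, the 1 to 5 scales, the likelihood-times-severity score, and the three example entries are all hypothetical, chosen only to show the shape of a ranked threat model.

```python
from dataclasses import dataclass


@dataclass
class Threat:
    what: str        # what could go wrong
    for_whom: str    # who is harmed
    detection: str   # how we would catch it
    risk_type: str   # "misuse", "accident", or "systemic"
    likelihood: int  # assumed scale: 1 (rare) to 5 (expected)
    severity: int    # assumed scale: 1 (minor) to 5 (severe)

    @property
    def score(self) -> int:
        # Likelihood x severity is one common heuristic, not the only option.
        return self.likelihood * self.severity


def rank(threats: list[Threat]) -> list[Threat]:
    """Return threats sorted so the highest-scoring ones come first."""
    return sorted(threats, key=lambda t: t.score, reverse=True)


# Illustrative entries for an imagined chat assistant; the numbers are made up.
threats = [
    Threat("User obtains disallowed content", "third parties",
           "red-team prompts and abuse monitoring", "misuse", 4, 3),
    Threat("Confidently wrong medical answer on a benign prompt", "the user",
           "evaluation set with expert-checked answers", "accident", 3, 5),
    Threat("Agent takes a destructive action", "the deploying team",
           "sandboxed test runs and action logs", "accident", 2, 5),
]

ranked = rank(threats)
for t in ranked:
    print(f"[{t.score:>2}] {t.risk_type}: {t.what} (catch via: {t.detection})")
```

Any scoring rule would work here; what matters is that each entry answers all three questions and that the list is ordered, so the work starts with the top item.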

Your work today

Read + Map the Field

~75-minute foundation

Read the intro and the five problems of Concrete Problems in AI Safety (§1–3). For each problem, note which of the three risk types it most relates to.
Skim Unsolved Problems in ML Safety — just the four pillars (Robustness, Monitoring, Alignment, Systemic Safety). This is the skeleton of the whole field.
In a notebook, write one sentence each defining misuse, accident, and systemic risk in your own words, with an example of each.

The full curated, verified resource list for this week is at the bottom of the page — start with the ones marked Start here.

The expert move

An enthusiast says "AI could be dangerous." An expert immediately decomposes: which risk — misuse, accident, or systemic — for whom, and how would we measure it? The altitude jump is from having a concern to having a threat model: a structured, ranked answer you can act on and defend.

Say this in an interview: "I don't treat 'is it safe' as one question. I separate misuse, accidental misbehavior, and systemic harm, because each has different causes, different defenses, and different owners — and I start by threat-modeling the specific deployment, not the technology in the abstract."

Today's Takeaways

AI safety is an engineering discipline — finding, measuring, and reducing failures — not a vibe.
Capability ≠ safety: they're different properties, earned separately.
Three risk types: misuse (users), accidents (the model), systemic (the world). Name which one.
The core habit is threat modeling: what could go wrong, for whom, and how would we catch it?